FlashPCA2: principal component analysis of Biobank-scale genotype datasets

Authors

  • Gad Abraham
  • Yixuan Qiu
  • Michael Inouye

Abstract

Motivation: Principal component analysis (PCA) is a crucial step in quality control of genomic data and a common approach for understanding population genetic structure. With the advent of large genotyping studies involving hundreds of thousands of individuals, standard approaches are no longer feasible. However, when the full decomposition is not required, substantial computational savings can be made.

Results: We present FlashPCA2, a tool that can perform partial PCA on 1 million individuals faster than competing approaches, while requiring substantially less memory.

Availability and implementation: https://github.com/gabraham/flashpca

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
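
To illustrate the remark that a partial decomposition is cheaper than a full one, here is a minimal sketch of partial PCA on a small synthetic genotype matrix. It is not FlashPCA2's implementation or interface: it swaps in SciPy's truncated SVD (`scipy.sparse.linalg.svds`), and the function names `standardize` and `partial_pca`, the matrix sizes, and the 0/1/2 genotype coding are assumptions made only for this example.

```python
import numpy as np
from scipy.sparse.linalg import svds


def standardize(genotypes):
    """Center and scale each SNP column (hypothetical helper for this sketch)."""
    X = genotypes.astype(float)      # astype returns a copy, so the input is untouched
    X -= X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                # leave monomorphic SNPs at zero instead of dividing by 0
    X /= sd
    return X


def partial_pca(genotypes, k):
    """Return the top-k principal component scores and eigenvalues.

    Only k singular triplets are computed, rather than the full decomposition.
    """
    X = standardize(genotypes)
    U, s, _ = svds(X, k=k)           # truncated SVD; ordering of s is not guaranteed
    order = np.argsort(s)[::-1]      # sort components from largest to smallest
    U, s = U[:, order], s[order]
    scores = U * s                   # projections of individuals onto the top-k PCs
    eigvals = s ** 2 / (X.shape[0] - 1)
    return scores, eigvals


# Synthetic data: 200 individuals x 500 SNPs, genotypes coded 0/1/2 (assumed coding).
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(200, 500))
pcs, eigvals = partial_pca(G, k=5)
print(pcs.shape)
```

A matrix this small could be fully decomposed without difficulty; the point of the sketch is only that the caller asks for `k` components and the solver never forms the remaining ones.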

Similar resources

Representing Spectral data using LabPQR color space in comparison to PCA method

In many applications of color technology, such as spectral color reproduction, it is of interest to represent spectral data in fewer dimensions than the spectral space itself has. For more than half a century, principal component analysis (PCA) has been applied to find the number of independent basis vectors of a spectral dataset and to represent spectral reflectance with lower di...

Genotype × environment interaction and dry leaf yield stability of Virginia tobacco genotypes using Tai regression and AMMI

To determine the yield stability and adaptability of tobacco genotypes, five genotypes of flue-cured tobacco were evaluated in an experiment using a randomized complete block design (RCBD) with three replications at two locations, the Rasht and Tirtash Tobacco Research Centers (Iran), during the growing seasons of 2008-2010 (four environments). The interaction of genotype × environment in ...

Study of yield stability in sunflower cultivars using the AMMI method

Genotype-environment interaction matters to plant breeders because it complicates both breeding for high-yielding varieties and the release of new genotypes. In order to assess the adaptability and stability of sunflower varieties under different climatic conditions, twelve cultivars were investigated in Karaj, Shiraz, Birjand, Kashmar and Arak in randomized complete block designs with three replic...

Controlling for population structure and genotyping platform bias in the eMERGE multi-institutional biobank linked to electronic health records

Combining samples across multiple cohorts in large-scale scientific research programs is often required to achieve the necessary power for genome-wide association studies. Controlling for genomic ancestry through principal component analysis (PCA) to address the effect of population stratification is a common practice. In addition to local genomic variation, such as copy number variation and in...

Accurate Genomic Prediction Of Human Height

We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture, respectively, ∼40, 20, and 9 percent of total variance for the three traits. For example, predicted heights correla...

Journal:
  • Bioinformatics

Volume 33, Issue 17

Pages: -

Publication date: 2017